Kernel-based learning methods revolve around the notion of a Gram matrix between data points. These square, symmetric, positive semi-definite matrices can informally be regarded as encoding the pairwise similarity between all of the objects in a dataset. In this talk I present an algorithm for manipulating the diagonal entries of a kernel matrix using semi-definite programming. Kernel matrix diagonal dominance reduction addresses the problem of learning with almost orthogonal features, a phenomenon commonplace in kernel matrices derived from string kernels or from Gaussian kernels with a small width parameter. I will show how this task can be formulated as a semi-definite programming optimization problem that can be solved with readily available optimizers. On the theoretical side, I will give an analysis using Rademacher-based bounds that provides an alternative motivation for the 1-norm SVM, derived from kernel diagonal reduction.
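To make the idea concrete: a natural SDP formulation (an assumption on my part, not necessarily the exact one used in the talk) would maximise the total reduction sum(d_i) subject to K - diag(d) remaining positive semi-definite with d >= 0, which needs an SDP solver. The sketch below shows only the simplest special case in plain NumPy, where a single constant c is subtracted from the whole diagonal; the largest such c that keeps the matrix PSD is the smallest eigenvalue of K.

```python
import numpy as np

# Toy Gram matrix from a Gaussian (RBF) kernel with a small width,
# which makes the features nearly orthogonal: K is close to the
# identity, i.e. strongly diagonally dominant.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
gamma = 5.0  # small width parameter -> large gamma
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * sq_dists)

# Uniform diagonal reduction: subtract the largest constant c from the
# diagonal such that K - c*I stays positive semi-definite. Shifting the
# diagonal by c shifts every eigenvalue down by c, so the optimal c is
# the smallest eigenvalue of K.
c = np.linalg.eigvalsh(K).min()
K_reduced = K - c * np.eye(len(K))

# The reduced matrix is still PSD (its smallest eigenvalue is now ~0).
assert np.linalg.eigvalsh(K_reduced).min() >= -1e-10
```

The general per-entry version, with one reduction variable per diagonal entry, is what requires semi-definite programming; a modelling tool such as CVXPY with a PSD constraint on K - diag(d) would be one readily available way to solve it.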
Joint work with Thore Graepel (Microsoft Research, Cambridge UK) and John Shawe-Taylor (University of Southampton, UK)